Benchmark Ab Initio Determination of the Conformers, Proton Affinities, and Gas-Phase Basicities of Cysteine

A systematic conformational mapping combined with literature data leads to 85 stable neutral cysteine conformers. The implementation of the same mapping process for the protonated counterparts reveals 21 N-(amino-), 64 O-(carbonyl-), and 37 S-(thiol-)protonated cysteine conformers. Their relative energies and harmonic vibrational frequencies are given at the MP2/aug-cc-pVDZ level of theory. Further benchmark ab initio computations are performed for the 10 lowest-lying neutral and protonated amino acid conformers (for each type) such as CCSD(T)-F12a/cc-pVDZ-F12 geometry optimizations (and frequency computations for cysteine) as well as auxiliary correction computations of the basis set effects up to CCSD(T)-F12b/cc-pVQZ-F12, electron correlation effects up to CCSDT(Q), core correlation effects, second-order Douglass–Kroll relativistic effects, and zero-point energy contributions. Boltzmann-averaged 0 (298.15) K proton affinity and [298.15 K gas-phase basicity] values of cysteine are predicted to be 214.96 (216.39) [208.21], 201.83 (203.55) [194.16], and 193.31 (194.74) [186.40] kcal/mol for N-, O-, and S-protonation, respectively, also considering the previously described auxiliary corrections.


INTRODUCTION
The studies of gas-phase amino acids date back long time ago, both theoretically and experimentally. Their geometries and vibrational frequencies can be used to identify them spectroscopically, 1,2 especially in interstellar studies, 3,4 while the proton affinities (PAs) and the gas-phase basicities (GBs) can be helpful to study protonation and formation processes. 5−8 PAs and GBs are also used in mass spectrometry processes, since these properties govern fragment formation. 9,10 However, PAs and GBs can also be useful in biochemical applications, since in the knowledge of the solvation energies one can convert these values to liquid state. 11 The smallest amino acid, glycine, underwent many theoretical and experimental studies; 12−21 however, if we move forward to more complex molecules, like the cysteine, the number of publications (and in computations, the applied level of theory) drop(s) rapidly due to the more complicated structures (and the higher number of electrons), which statement holds both for the conformers and for the PA/ GB values. 1,2,30,31,22−29 In 2007, a study by Wilke et al. 26 mapped the complete conformational space at the HF/3-21G level of theory and reoptimized the structures at the MP2/cc-pVTZ level, while the 10 lowest-energy conformers were further optimized at MP2/aug-cc-pV(T+d)Z. In 2013, Freeman 30 took six conformers with the deepest energy, and they submitted them to geometry optimization and frequency calculations with density functional theory (DFT) and at the MP2/6-311+G(d,p) level of theory. Furthermore, they performed CCSD(T) and QCISD(T) single-point energy calculations with the cc-pVTZ basis set. There are recent studies 32−35 using algorithms and machine learning to find conformers, but the finally applied methods are either DFT or MP2. These techniques are based on the exploration of the PES of the target molecule going further than stochastical (Monte Carlo sampling) or deterministic (molecular dynamics) approaches by metaheuristic (e.g., nature inspired algorithms) methods to provide cost-efficient alternatives. For PA and GB, there are a few theoretical studies, the most recent ones are from 2011, 27,29 where the computational level is either based on MP2 with single-point coupled-cluster computations 27 or DFT. 29 Another weakness present in other studies that they only consider the lowest-energy conformers for the protonation and/or only one protonation site.
After our high-level theoretical studies on glycine and alanine, 16,36 here we present an explicitly correlated ab initio investigation on the conformers of the cysteine and its protonated counterparts as well as PA and GB values. From the qualitative side, we map the extended conformational space of the neutral and protonated amino acid; in the latter case, we consider three possible protonation sites. These investigations hopefully reveal new conformers in addition to previous studies for both forms of cysteine. To achieve quantitatively benchmark group (S-protonation) does not make new rotations necessary; however, we cannot profit from symmetry, so we have 6 5 = 7776 initial configurations. We note in advance that based on conformer search for the neutral cysteine, the use of the MP2 method with the 6-31++G** and cc-pVDZ basis sets is advised for mapping the conformational potential of the protonated species.
To distinguish the optimized results, we use the same strategy in both cases and later on too. First, we sort the structures by their energies relative to the minimum and by their A rotational constant. We declare two geometries different if there is a minimum difference of 0.01 kcal/mol in the relative energy and/ or >3 × 10 −4 cm −1 deviation in A; these limits are based on the analysis of the results. Until this point, we do not differentiate transition states and minima, but to do this, we further optimize the conformers and compute the harmonic frequencies using the MP2 method with the correlation-consistent aug-cc-pVDZ basis set. 38 We use MOLPRO 40 and MRCC 41,42 programs to perform the ab initio computations and our python code for data analysis.

Benchmark Structures and Energies.
We subject the 10 lowest-lying minima of the neutral and protonated amino acid to higher level computations with the explicitly correlated coupled-cluster singles, doubles and perturbative triples method 43,44 (CCSD(T)-F12a) with cc-pVDZ-F12 45 (geometry and frequencies (the latter is only for neutral Cys)). Using the cc-pVTZ-F12 and cc-pVQZ-F12 basis sets, we perform singlepoint energy computations. In the case of the cc-pVQZ-F12 basis set, we use both the CCSD(T)-F12a and CCSD(T)-F12b methods, and the latter is utilized for the final benchmark energy computations as F12b is expected to be slightly more accurate than F12a with the large QZ basis set. 40 In addition, we compute the following energy corrections based on the CCSD(T)-F12a/ cc-pVDZ-F12 geometries: The coupled-cluster triples (δT) 46 and perturbative quadruples (δ(Q)) 47 corrections are determined with the 6-31G basis set. Previous investigations 16 compared to smaller (3-21G) and larger (cc-pVDZ) basis sets showed that the 6-31G basis set is sufficient to provide satisfying accuracy for glycine.
We compute all-electron (AE) and frozen-core (FC) energies at the CCSD(T)-F12a/cc-pCVTZ-F12 level of theory 48 and define the core correlation correction as FC methods only correlate the electrons on the valence shells, whereas utilizing AE computations, we correlate the 1s 2 electrons for the C, N, and O atoms and the 2s 2 , 2p 6 electrons for the S atom.
To determine the scalar relativistic effects with computing the second-order Douglas−Kroll 49 (DK) relativistic energies, we use the AE-CCSD(T) 50 method with the aug-cc-pwCVTZ-DK 51 basis set: At the end, we obtain the benchmark electronic (equilibrium) and adiabatic (ZPE corrected) energies by summarizing the cc-pVQZ-F12 single-point energies with the corrections: 2.3. Proton Affinity and Gas-Phase Basicity Computations. The PA and the GB equal the enthalpy (ΔH) and the Gibbs free energy (ΔG) change of the following gaseous reaction: BH + is the protonated conjugate acid, B is the analogue gaseous base, and H + is a free proton. We employ the rigid rotor and harmonic oscillator models in addition to our ab initio computations to determine the PA and GB values. Temperature corrections can be obtained for the translational, vibrational, and rotational enthalpies and entropies applying standard statistical mechanics expressions. Since we have conformer mixtures, we need to calculate the population of each one. We perform this using a simple Boltzmann-distribution: where x i is the relative population of the ith conformer, and ΔG rel, i ○ is the molar standard Gibbs free energy of the ith conformer relative to the most stable conformer.

Conformers of the Neutral and the Protonated
Cysteine. The summary of the systematic conformer search for cysteine can be seen in Table 1 based on the MP2 method with six different basis sets. The internal rotations were mapped by 60°in the corresponding internal torsional angles for every basis set. These initial geometries were optimized, and if there was convergence, we assigned them to a conformer. Finally, we obtain 48,44,39,51,49, and 50 stationary points and 45,41,37,47,45, and 43 minima from the 3-21G, 6-31G, 6-31++G, 6-31G**, cc-pVDZ, and 6-31++G** basis sets, respectively, at the MP2/aug-cc-pVDZ level of theory. We distinguish unique stationary points and minima, which were not described by smaller basis sets. All in all, we got 55 different minimum conformers with these methods. The 6-31++G underperforms both the larger and the smaller basis sets; however, we cannot obtain general information about the usefulness of the diffuse functions since while adding them to the 6-31G** basis set decreases the number of minima from 47 to 43, five unique conformers get located this way. For larger systems or when the goal is not to find all the conformers, 3-21G or 6-31G** can be handy, since their computation cost is quite cheap, while they lead to a high number of conformers (the former is better in the <2.5 kcal/mol region, it provided all of the ones found in present work under this energy limit). For smaller molecules, where it is feasible, it seems that it is worth using even larger basis sets than operated here, but not as a standalone, but as expansions to smaller ones.
Nonetheless, we were not satisfied as we compared our results with the literature. The neutral cysteine has been studied both in the past 22,24,26,29,52,53 and recently, 32,33,35 and there is no absolute agreement in the number of cysteine minima. The most comprehensive study was done by Wilke et al., 26 and they found 71 minimum conformers at the MP2/cc-pVTZ level. To directly compare our results, we reoptimized our structures with the former level of theory and performed vibrational frequency computations to check if the difference origins from imaginary frequencies, but that was not the case. The number of missing conformers was not just simply 71−55 = 16, but 31, since some of our results were not present in their work. It is important to note, that at this level of theory, we had two new conformers with less than 3 kcal/mol relative energy to the global minimum. After this, we tried thinking backward: we reoptimized the missing minima and computed their frequencies at the MP2/ aug-cc-pVDZ level. We could reduce the 31 by one, since two geometries converged to the same structure and the frequencies showed that these are also indeed minima. One difference in the present work is the starting level of theory: we tried the reproduction with HF/3-21G used by ref 26. This way we found 2 of the missing 30, so 28 more were needed. Also, it is worth noting that the rotation with HF/3-21G also resulted in some of the conformers that was not present in the previous work. Another difference was in the resolution of the rotation: previous work did not rotate with 60°uniformly, but with 30°f or the amino group and the sidechain and 120°for the other torsional angles. We also tried rotating the two former parts of the cysteine by not 60°but with 30°systematically at the HF/3-21G level. This produced us only one more of the missing conformers. Finally, we adapted the remaining 27 conformers to our 55 + 2 + 1, meaning we have 85 conformers at the MP2/augcc-pVDZ level. We cannot come to conclusion what causes the difference, we also tried to analyze the structure of the missing conformers, but this did not show any useful information regarding any solution or explanation. The inequality might rise from using different electronic structure packages (different optimization method) or a different initial geometry, which gives the base of this kind of conformer search. The occurrence of each conformer during the conformational mapping with MP2/ 6-31++G** and MP2/cc-pVDZ methods can be seen in Figure  1. In general, if one geometry is more or less likely to be found with one basis set, it is the same with the other, but there are a few exceptions. For example, conformer number 10 was found in 356 cases with the former basis set, while only once with the latter. The geometries which were not located by both of the basis sets are not frequent as they are found in less than 10 times in every case, usually 1 or 2 times. The two exceptions are conformer number 11 and 44, which present 340 and 32 times in By unique, we mean a geometry that was not found with smaller basis set(s).
The Journal of Physical Chemistry A pubs.acs.org/JPCA Article the cc-pVDZ data set, respectively. 6-31++G** found 6, while cc-pVDZ located 10 conformers, which were not identified by the other basis set. As many structures are not so frequent, this finding also indicates that this kind of conformational space mapping can be very sensitive to the initial geometry, leading to the conclusion that there might be some conformers neglected even in the not so high-energy region. The relative energy distribution of the aforementioned conformers can be seen in Figure 2, where we noted the origin of the structures. The highest energy is 10.91 kcal/mol, while the lowest is 1.48 kcal/mol, relative to the global minimum, at the MP2/aug-cc-pVDZ level of theory. Between these two limits, the values are fairly continuous, with three bigger gaps, one at number 68 (0.64 kcal/mol), the middle one at number 71 (0.43 kcal/mol), and the last one at number 83 (0.60 kcal/mol). We also included the origin, i.e., ref 26 or this work, of each conformer in the figure. It seems that the two studies are nice complementary to each other, since there are conformers found by only one of them in practically every energy region.
Protonation tests showed similar results to that in the case of glycine and alanine: the cysteine protonation can happen at every predicted site except the hydroxyl, so we can categorize our structures as N-(amino-), O-(carbonyl-), and S-(thiol-)protonated conformers. In the case of the protonated cysteine, we choose the cc-pVDZ and 6-31++G** basis sets with the MP2 method for the mapping based on the results obtained for the neutral cysteine. These two combined provided all of the cysteine conformers found with our original method, yet they still had structures that were found by only one of them. The final conformers were optimized at the MP2/aug-cc-pVDZ level of theory along with harmonic vibrational frequency computations, and the results were 21, 64, and 37 N-, O-, and Sprotonated conformers, respectively, along with some "byproduct" structures. While the global minimum agrees in our study with previous reports, 29,53 the number of protonated conformers has been greatly increased as compared to 21 in ref 29 or 6 in ref 53. We can explain the relatively high number of the O-protonated conformers by considering the conformational space: in that case we have a new important torsional angle which leads to more stable conformers just like in the case of glycine and alanine. 16,36 The occurrence of the conformers upon the three mapping can be seen in Figure 3. With the initially Nprotonated data set (first column), we only got N-protonated structures, 16 of the final 21. The two basis sets are in good agreement with the number of points leading to the same conformer, and it can be seen that the 6-31++G** found 1 (N17), while the cc-pVDZ found 2 (N4 and N21) minima, that were not present in the other data set with occurrences of 2, 1, and 6, respectively. Many of the originally O-protonated initial structures (second column) converged to N-and S-protonated ones, the former ones are much deeper in energy, while the latter ones mix with the carbonyl-protonated structures in energy (the energy distribution will be discussed later). The aminoprotonated N9, N10, and N11 conformers were located only by this mapping process with the cc-pVDZ basis set from 2, 1, and 2 initial structures (which already seem very small portion, especially if we keep in mind, that in this case we have more than 20,000 points in comparison to the over 2000 initially aminoprotonated structures). We find 7 of the 37 thiol-protonated minima, but none of them are exclusive to this data set. Finally, we find all the final 64 carbonyl-protonated structures. Again the number of initial geometries leading to one minimum is usually in good agreement between the two basis sets, and the minima limited to one of them (9 for 6-31++G** and 8 for cc-pVDZ) are not frequent. The initially thiol-protonated data set (third column) leads to 37 of the 37 S-protonated geometries, also to many amino-(14 of 21) and some carbonyl-(the first 18 of the 64) protonated conformers for the same energetic reasons. Here, we can observe much difference between the two basis sets with respect to the previous searches, and the occurrence is not in a good agreement between them. For example, the lowest conformer, S1 was only located by 6-31++G**, but quite frequently, 245 times (6 unique minima for 6-31++G** and 8 for cc-pVDZ)! From these mapping processes, we can draw the following conclusions: these two basis sets work well with each other as it was found in the case of the neutral cysteine. The rarity of some minima suggests that this kind of search is very sensitive to the initial geometry, and there might be other, yet neglected structures even in the low energy region. The energy  . MP2/aug-cc-pVDZ relative energies of the 85 cysteine conformers discussed in present work. Red means that the given conformer was only located by this work, blue means it was only found by a previous study 26 while gray means it was identified by both studies.
The Journal of Physical Chemistry A pubs.acs.org/JPCA Article distribution of these minima can be seen in Figure 4. The deepest energy belongs to the amino-protonated structures, while comparing the other sites, the carbonyl-protonated minimum has ∼15.6 kcal/mol while the thiol-protonated minimum has ∼25.3 kcal/mol relative energy with respect to the N-protonated minimum and as one can see, the protonation of the different sites "overlap," as the highest N-protonated conformer has higher relative energy than the lowest two Oprotonated ones. Also, from the relative energy levels of the Sprotonation the carbonyl-and thiol-protonated structures mix. This also explains that during the systematic rotations many initial structures converged into a geometry with a different protonation site.

Byproduct Conformers.
Upon searching for the minimum geometries, we found some "byproduct" structures that are not relevant for our work, but they are worth mentioning. MP2/aug-cc-pVDZ harmonic frequency calculations proved that all of them are indeed minima. The relative energies are given with respect to the lowest-energy cysteine conformer for the structures found while mapping the neutral cysteine, and to the lowest-energy N-protonated cysteine for the structures found during the protonated cysteine conformer search.
For the neutral amino acid, we could categorize them into two classes, called "Broken" and "Rearranged" as can be seen in Figure 5. For the former ones, one can observe two bond breaks: an S−C and an H−C, the hydrogen joins to the thiol group and the SH 2 positions itself away from the main molecule. The 23 geometries lie in the 14.86−23.67 kcal/mol relative energy range. Our last cysteine minimum has a relative energy of 10.91 kcal/mol, so there is a gap in-between, but this indicates that one should take extra caution in the event of conformer search as in bigger molecules the difference might disappear. We suggest that it is useful to have basic visual confirmation, or in the case of large data sets, some kind of restriction on the atomic bond lengths to determine, whether we found the target molecule or something else. The other class of the byproducts is the "Rearranged." These geometries at the first glance might look like a thiol-protonated conformer, but the origin of the proton is the carbon atom of the side chain. The 20 conformers exist in the 65.91−76.87 kcal/mol relative energy regime, so these are not so easily mistaken as a neutral cysteine. In agreement with our Figure 3. Occurrence of the different N-, O-, and S-protonated cysteine conformers (first, second, and third rows, respectively) during the mapping of the conformational space with the 6-31++G** and cc-pVDZ basis sets using the MP2 method. The three columns represent the three different mapping process, based on the rotation of an amino-(left), carbonyl-(middle), or thiol-(right) protonated initial geometry. For the protonated cysteine, we found three classes as it can be seen in Figure 6: "Broken," "Rearranged", and "Cyclized," respectively. We have three types of bond breakages differing whether we find SH 2 (Broken 1 ), OH 2 (Broken 2 ), or NH 3 (Broken 3 ) along with the remnant larger molecule fragment. There are 40, 5, and 2 minima in the 8.00−49.34 kcal/mol and 31.82−41.16 kcal/mol relative energy ranges and with 26.76 and 29.97 kcal/mol relative energies, respectively. The gap between the target molecules (protonated cysteine) and the byproducts ceases to exist in this case, so separation based on the number of bonds and their lengths comes handy. One could argue that the system with the OH 2 is not broken, rather hydroxyl-protonated, but the elongated C−O bond (∼2.7 Å) tells us that it is some kind of a molecular complex at best, in accordance with our protonation site tests. We have six Rearranged 1 structures, where a hydrogen atom from the carbon atom of the sidechain changes its position to a possible protonation site. The relative energies are in the 59.45−64.92 kcal/mol region. We found one Rearranged 2 conformer with a relative energy of 25.37 kcal/ mol, where the N-and O-containing ligands switch carbon atoms. There is a new class, called Cyclized since here we observed some kind of ring formation in three different variations. The first one contains a four-atom ring, C−C−C− S, we found one of this type with a relative energy of 20.61 kcal/ mol.

Further Benchmark Structures and Energies.
Applying the Boltzmann-distribution shows us that the structures with higher energy level quickly become negligible for reaching the desired high accuracy for PA and GB. Therefore, we took the 10 conformers (both the neutral and protonated cysteine) with the lowest relative energies (it is important to note that the order of the conformers was determined at the MP2/aug-cc-pVDZ level of theory, since there are multiple changes in the final lineup with the adjustment of the level of theory) and subjected them under further analysis.
The 10 selected neutral Cys conformers can be seen in Figure  7, the 10 amino-protonated in Figure 8, the 10 carbonylprotonated in Figure 9, and finally, the 10 thiol-protonated in Figure 10. The geometries are at the CCSD(T)-F12a/cc-pVDZ-F12 level of theory, the notation of Roman numerals indicates increasing CCSD(T)-F12b/cc-pVQZ-F12 single-point energies while the subscript letter refers to the protonation site. In general, we can say that the structures are usually stabilized by one or more intramolecular hydrogen bond(s), as expected. The N-protonation hinders the acceptor role of the N atom, while the O protonation blocks the same character of the carbonyl O atom. The cysteine global minimum resembles to the II N 16 and II a 36 structures of the glycine and the alanine with respect to the dihedral angles related to the α and β carbon atoms, and it is also stabilized with a hydrogen bond between the lone electron pair    The relative energies, enthalpies, free energies, and auxiliary energy corrections of the neutral cysteine can be seen in Table 2. The relative energy order changes at many places at higher level of theory, coupled-cluster order is noted by Roman numerals, while the MP2 order by Arabic numerals. The difference between the MP2/aug-cc-pVDZ and CCSD(T)-F12a/cc-pVDZ-F12 relative energies can be separated into two sources: one part originates of course from the higher level of theory, while the other part's source is the change in geometry. The latter can be calculated as the difference between the CCSD(T)-F12a/cc-pVDZ-F12 energies at the MP2/aug-cc-pVDZ and CCSD(T)-F12a/cc-pVDZ-F12 structures. Here, this contribution is not more than a few times 10 −2 kcal/mol, so the MP2/ aug-cc-pVDZ structure is quite good. It can also be seen that the MP2/aug-cc-pVDZ and CCSD(T)-F12a/cc-pVDZ-F12 relative energies match within a few 10 −2 kcal/mol, except for the two last conformers, where the changes are −0.37 and +0.15 kcal/mol, respectively. Single-point computations with cc-pVTZ-F12 and cc-pVQZ-F12 basis sets show good agreement between them, correcting the cc-pVDZ-F12 relative energies with 0.01−0.04 kcal/mol, proving the good basis-set convergence. The cc-pVQZ-F12 basis set was utilized both with the CCSD(T)-F12a and CCSD(T)-F12b methods. In general, for cc-pVQZ-F12 basis set, the F12b method is suggested, 40 in relative energies we do not see much difference, as it only reaches 0.01 kcal/mol for conformer III. From the complementary corrections, correlating the inner electrons seems necessary to achieve high accuracy, as the core correlation value can be as high as 0.05 kcal/mol (for conformer X), and in 0.03 kcal/mol on average. The Douglass−Kroll relativistic effects seem to be less important, as the relativistic correction stays below 0.02 kcal/mol (negative value in every case), but usually it is 0.01 kcal/mol. The post-(T) corrections are important, as the sum of the δT and δ(Q) contributions is 0.05−0.06 kcal/mol for most of the conformers, the exceptions are conformer II with −0.01 kcal/mol value (the only negative one), conformer IX, where it can be neglected, and conformer X, where it is 0.04 kcal/mol. The two components (δT and δ(Q)) are similar in most of the cases. In summary, these improvements change the relative energies by 0.05 kcal/mol, but for some cases, the corrections reach 0.08−0.09 kcal/mol. Conformers VI and VII are very close in energy, with a higher level of theory, conformer VI seems to be deeper in energy, but after adding the corrections, VII exchanges with VI in the relative energy order. For the neutral amino acid conformers, we computed MP2/aug-cc-pVDZ and CCSD(T)-F12a/cc-pVDZ-F12 harmonic vibrational frequencies. Considering the computational cost vs the increase in the accuracy, the latter is not advised for protonated amino acids (as it was found out), but we were interested in the order of magnitude of the difference between the two methods, to estimate the accuracy of our final results. In Table 2, we display the MP2/aug-cc-pVDZ zero-point energy correction; on average, they differ from the CCSD(T)-F12a data by 0.08 kcal/ mol. Our computations consider a harmonic oscillator model, and the neglected anharmonicity can have an effect of 0.01−0.1 kcal/mol on the ZPE corrections as shown in previous work for amino acids. 54,55 After the addition of the ZPE corrections, we finally get the 0 K enthalpy values and we can see changes in the order in the case of conformers II−III, VIII−IX, and the difference between VI−VII enlarges. After computing the thermal contributions with the help of statistical thermodynamics, we get the enthalpy values at 298.15 K. Here, we also have changes, as the final order is I, III, II, IV, VII, V, VI, VIII, X, and IX, and the change in the relative enthalpy of conformers IX and X seems to be an outlier. Subtracting the TS term, where T and S denote the temperature and entropy, respectively, we get the free energy values, and the value of this contribution varies for the conformers. The thermal corrections are very sensitive to low frequency vibrations, as mentioned before in the case of    Tables 3, 4, and 5, respectively. The notation is similar to that for cysteine, but the subscript (in CCSD(T)-F12b/cc-pVQZ-F12 order) or the first character (MP2/aug-cc-pVDZ order) refers to the site of the protonation.
The MP2/aug-cc-pVDZ geometries are accurate for the aminoprotonated ones, the geometry effect is 0.02 kcal/mol on average, and for the carbonyl-protonated structures, the average grows larger to 0.06 kcal/mol. In the case of thiol protonation, we find that the geometry effect is much greater, 0.55 kcal/mol in general, while it is more than 1.00 kcal/mol for IX S ! The differences between MP2/aug-cc-pVDZ and CCSD(T)-F12a/        The sum of all of the corrections is 0.06 kcal/mol in general, but it can reach even 0.13 kcal/mol (for VI S and VII S ). With these, we obtain benchmark relative equilibrium energies, and unlike in the case of neutral cysteine, there is no change in energy order. Based on the MP2/ aug-cc-pVDZ harmonic vibrational frequencies, we calculate the zero-point vibrational energy corrections. For the aminoprotonated minima, the sign of the ZPE corrections varies, and the absolute corrections are usually around 0.11 kcal/mol. The carbonyl-protonated geometries have this correction with negative sign in every case, 0.47 kcal/mol on average. For the thiol-protonated conformers, the average ZPE correction becomes 0.79 kcal/mol, and it is a positive value for every one of them. ΔE e + Δ ZPE equals to benchmark adiabatic relative energies, and the order of the minima stays the same except for VI S and VII S as their order is switched. The thermal corrections leading to the 298.15 K enthalpy values are usually a few tenth kcal/mol, except for some outliers like VIII S , where it is over 1 kcal/mol. We finally obtain the 298.15 K Gibbs free energy values considering entropy effects. The amount of these effects varies between 0.01 and 1.34 kcal/mol, and it has negative sign except for VIII S . The uncertainty of the relative enthalpies and that of free energies are similar as in the case of cysteine.

Proton Affinity and Gas-Phase Basicity.
The newly obtained proton affinity and gas-phase basicities with the auxiliary corrections can be seen in Table 6. We included both the 0 K and the 298.15 K values, since the latter ones can be used in practice. We also calculated these quantities for the protonation of the different sites, since if we use a global mixture, the amino protonation dominates the process, overshadowing the carbonyl and thiol protonation. The core correction is contributing more to the benchmark results, since it is 0.10 kcal/mol for the amino protonation, 0.05 or 0.04 kcal/mol for the carbonyl-protonation (if we consider the lowest energy conformers or mixtures, respectively), while it has the value of −0.06 kcal/mol for the thiol protonation. The last of these additions originates from the relativistic effect. It is a positive component in the range of 0.02−0.06 kcal/mol and is the most relevant for the S-protonation. The sum of these corrections is 0.09 kcal/mol for the amino, 0.05/0.04 kcal/mol (minima/mixture) for the carbonyl, and −0.03 kcal/mol for the thiol site. Based on the above correction values, we estimate an uncertainty of about 0.1 kcal/mol for the equilibrium PA values.
From the equilibrium PA values, we can get the enthalpy change of protonation at 0 K. The ZPE effect is roughly −8.5/−8.7; −7.5/−7.7; and −5.5/−5.8 kcal/mol depending on which protonation site we consider (minima/mixtures). The amino protonation Δ ZPE value is lower, than in the case of glycine, 16 but the carbonyl-protonation is almost the same (approximately −9.1 and −7.7 for both minima and mixtures). We convert the value to 298. 15   There are theoretical publications in the literature already, 9,23,25,27,29 but this work has many novelties, if one considers the number of conformers taken into account, the level of theory and corrections, and finally the consideration of the different protonation sites. From the experimental side, one can also find papers, which are not very recent. In 1992, Gorman 56 obtained PA of 214.6 ± 2.7 kcal/mol from Fourier transform ion cyclotron resonance spectrometry experiments. Six years later, Hunter and Lias 57 published an immense database of PA and GB for 1700 species. They collected and critically re-evaluated the state-of-the-art published data. For the cysteine at 298 K, they suggested 215.87 kcal/mol PA and 207.77 kcal/mol GB values, which are in good agreement with our theoretical predictions of 216.39 ± 0.40 and 208.21 ± 1.20 kcal/mol, respectively. Afonso et al. 58 in 2000 performed electrospray ionization (ESI)-ion trap mass spectrometry measurements and obtained 214.41 kcal/mol for the PA of cysteine, although this value might not be considered as reliable as the former ones. 9

SUMMARY AND CONCLUSIONS
We have mapped the conformational space of the neutral cysteine with six different basis sets, namely 3-21G, 6-31G, 6-31++G, 6-31G**, 6-31++G**, and cc-pVDZ using the MP2 method. The two latter bases proved to be the most efficient for the search of possible minimum geometries, combined together, as they both found unique conformers, but if computationally affordable, it might be useful to use even larger basis sets. Compared to the most adequate publication, 26 we finally managed to locate 14 new minima, 85 in total at the MP2/augcc-pVDZ level of theory. The analysis of the occurrences in the data points showed that the search method is very sensitive to the starting geometry, and some conformers can be easily overlooked. We tested the possible protonation sites of cysteine, the thiol, amino, carbonyl, and the hydroxyl group. The protonated hydroxyl group is proven to be not stable. Based on our experience in the case of the neutral cysteine, we searched for the protonated cysteine conformers with the 6-31++G** and the cc-pVDZ basis sets combined with the MP2 method. We found 21 amino-, 64 carbonyl-, and 37 thiol-protonated structures, 122 in total, again at the MP2/aug-cc-pVDZ level of theory in comparison to the 21 reported in the literature. 53 The 10 lowest-energy conformers of each group (the neutral cysteine and its N-, O-and S-protonated counterparts) were subjected for further benchmark investigations: from CCSD-(T)-F12a/cc-pVDZ-F12 geometry optimization up to CCSD-(T)-F12b/cc-pVQZ-F12 single-point energy computations. Core correlation correction, second-order Douglass−Kroll relativistic correction, and post-(T) correction (δT and δ(Q)) were also computed and shown to be necessary to achieve the desired sub-chemical accuracy. With the addition of these, we finally obtained the benchmark ab initio equilibrium relative energies for every conformer, with the Δ ZPE we get the adiabatic relative energies. Applying statistical thermodynamics relative enthalpies and relative Gibbs free energies became available at 298.15 K.
Finally, these results of the cysteine and protonated cysteine yield us benchmark proton affinity and gas-phase basicity values. The protonation of the three different sites was considered, and the results showed that the protonation of the amino group is the most favored thermodynamically. We considered the protonation of the conformer with the lowest energy only and conformer mixtures too. For the PA, we calculated 216.39 ± 0.40 kcal/mol while for the gas-phase basicity 208.21 ± 1.20 kcal/mol. These are in good agreement with previous experimental work showing us, that like in the case of glycine, 16 the relatively simple rigid rotator and harmonic oscillator models can lead to accurate absolute proton affinity values and gas-phase basicities. Hindered rotor or analytical frequency calculations could decrease the uncertainty of our thermodynamical values; nevertheless, the present results can still serve as benchmark references for new investigations.